Add KimiK2Detector with tool interruption support by JustinTong0323 · Pull Request #19696 · sgl-project/sglang

JustinTong0323 · 2026-03-02T13:40:30Z

Summary

Add a dedicated KimiK2Detector for Kimi K2 models that handles the <|tool_calls_section_begin|> token as an implicit end-of-thinking marker, allowing the model to switch from reasoning to tool-call sections without first emitting </think>.
Replace the previous kimi_k2 → Qwen3Detector mapping with the new KimiK2Detector.
Fix a regression introduced by [Blackwell] Make mxint4 flashinfer_trtllm moe gemm set by default on blackwell #18136 and [2/N] Quantization Refactor: Compressed tensors MoE schemes #17503 (comment) by renaming apply to apply_weights in CompressedTensorsMxInt4MoE. Without the rename, the server fails to launch:

...
sglang/srt/layers/quantization/compressed_tensors/compressed_tensors.py", line 680, in get_moe_scheme
    return CompressedTensorsMxInt4MoE(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Can't instantiate abstract class CompressedTensorsMxInt4MoE without an implementation for abstract method 'apply_weights'
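
The failure mode in this traceback can be reproduced in isolation. The sketch below is only an illustration of how Python's `abc` module behaves; the class names `MoESchemeBase`, `SchemeWithOldName`, and `SchemeRenamed` are hypothetical stand-ins and are not taken from sglang.

```python
from abc import ABC, abstractmethod


class MoESchemeBase(ABC):
    """Hypothetical stand-in for an abstract MoE scheme base class."""

    @abstractmethod
    def apply_weights(self, x):
        ...


class SchemeWithOldName(MoESchemeBase):
    # Still uses the old method name, so the abstract method
    # `apply_weights` is left unimplemented.
    def apply(self, x):
        return x


class SchemeRenamed(MoESchemeBase):
    # Renaming the method to `apply_weights` satisfies the interface.
    def apply_weights(self, x):
        return x


def can_instantiate(cls) -> bool:
    """Return True if `cls()` succeeds, False if it raises TypeError."""
    try:
        cls()
        return True
    except TypeError:
        return False
```

Under these assumptions, instantiating `SchemeWithOldName` raises the same kind of `TypeError` as shown above, while `SchemeRenamed` can be constructed normally.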

Test plan

Added unit tests for KimiK2Detector initialization, non-streaming parsing with tool interruption, streaming parsing with tool interruption, and post-interruption normal text handling.
Added integration tests through ReasoningParser API for both streaming and non-streaming Kimi K2 tool interruption scenarios.
python -m pytest test/registered/parser/test_reasoning_parser.py passes.
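
For readers unfamiliar with the streaming case, here is a minimal sketch of the idea, not the code from this PR. It assumes the leading `<think>` has already been stripped, that the tool-section marker is kept in the normal text so a downstream tool-call parser can still see it, and that a marker may be split across chunks. `StreamingSplitter` and `_held_back_len` are hypothetical names.

```python
from typing import Tuple

THINK_END = "</think>"
TOOL_SECTION_START = "<|tool_calls_section_begin|>"
MARKERS = (THINK_END, TOOL_SECTION_START)


def _held_back_len(buf: str) -> int:
    """Length of the longest suffix of buf that is a proper prefix of a marker."""
    best = 0
    for marker in MARKERS:
        for n in range(min(len(buf), len(marker) - 1), 0, -1):
            if buf.endswith(marker[:n]):
                best = max(best, n)
                break
    return best


class StreamingSplitter:
    """Hypothetical helper: splits a chunked stream into reasoning and normal text."""

    def __init__(self) -> None:
        self.buffer = ""
        self.in_reasoning = True  # assumption: the stream starts inside reasoning

    def feed(self, chunk: str) -> Tuple[str, str]:
        """Return (reasoning_delta, normal_delta) for this chunk."""
        if not self.in_reasoning:
            return "", chunk
        self.buffer += chunk
        hits = [(self.buffer.find(m), m) for m in MARKERS if m in self.buffer]
        if hits:
            idx, marker = min(hits)  # earliest marker ends the reasoning phase
            reasoning, rest = self.buffer[:idx], self.buffer[idx:]
            if marker == THINK_END:
                rest = rest[len(THINK_END):]  # drop the explicit end token
            self.buffer = ""
            self.in_reasoning = False
            return reasoning, rest
        # No full marker yet: emit everything except a possible partial marker.
        keep = _held_back_len(self.buffer)
        cut = len(self.buffer) - keep
        emit, self.buffer = self.buffer[:cut], self.buffer[cut:]
        return emit, ""
```

Holding back a possible partial marker avoids leaking fragments such as `<|tool_` into the reasoning stream when the token arrives split across two chunks.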

gemini-code-assist · 2026-03-02T13:40:51Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the handling of Kimi K2 models by introducing a dedicated KimiK2Detector. This new detector improves the parsing of model outputs, particularly by enabling more flexible and efficient tool call interruptions. It streamlines the transition from reasoning to tool-calling sections, enhancing the model's ability to interact with external tools. Additionally, it simplifies the server's model-specific adjustments by removing outdated or redundant logic for Kimi K2/K2.5 quantization and MoE backend selection.

Highlights

New Detector for Kimi K2 Models: Introduced KimiK2Detector to specifically handle Kimi K2 models' reasoning format, including implicit tool interruption.
Enhanced Tool Interruption Handling: The new detector recognizes <|tool_calls_section_begin|> as an immediate end to the reasoning phase, allowing seamless transition to tool calls without requiring an explicit </think> token.
Updated Model Mapping: The kimi_k2 model is now correctly mapped to the new KimiK2Detector within the ReasoningParser.
Simplified MoE Backend Logic: Removed specific detection and auto-selection logic for Kimi K2/K2.5 int4 compressed-tensors related to the flashinfer_trtllm MoE backend, as it is no longer necessary.
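
As a rough illustration of the implicit end-of-thinking behavior described in the highlights, the following sketch splits a complete (non-streaming) output. It is not the detector added in this PR: `split_reasoning` is a hypothetical helper, and it assumes the output begins in reasoning mode and that the tool-section marker stays in the normal text.

```python
from typing import Tuple

THINK_START = "<think>"
THINK_END = "</think>"
TOOL_SECTION_START = "<|tool_calls_section_begin|>"


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning_text, normal_text) for a complete model output."""
    body = text[len(THINK_START):] if text.startswith(THINK_START) else text
    end_idx = body.find(THINK_END)
    tool_idx = body.find(TOOL_SECTION_START)

    # Implicit end: the tool section begins before any explicit </think>.
    if tool_idx != -1 and (end_idx == -1 or tool_idx < end_idx):
        return body[:tool_idx].strip(), body[tool_idx:]

    # Explicit end: drop the </think> token itself.
    if end_idx != -1:
        return body[:end_idx].strip(), body[end_idx + len(THINK_END):]

    # Neither marker found: everything is treated as (truncated) reasoning.
    return body.strip(), ""
```

For example, `split_reasoning("<think>need weather<|tool_calls_section_begin|>call")` yields `("need weather", "<|tool_calls_section_begin|>call")` under these assumptions.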


Changelog

python/sglang/srt/parser/reasoning_parser.py
- Added KimiK2Detector class, inheriting from BaseReasoningFormatDetector, to support Kimi K2 models with tool interruption.
- Updated the ReasoningParser's _detector_map to use KimiK2Detector for the "kimi_k2" model.
python/sglang/srt/server_args.py
- Removed the is_kimi_k2_k25_thinking_int4 variable and its associated logic for detecting specific Kimi K2/K2.5 int4 compressed-tensors.
- Modified the MoE runner backend auto-selection to no longer consider is_kimi_k2_k25_thinking_int4 for flashinfer_trtllm.
test/registered/parser/test_reasoning_parser.py
- Imported KimiK2Detector.
- Added TestKimiK2Detector class with unit tests for initialization, non-streaming parsing with tool interruption, streaming parsing with tool interruption, and handling normal text after interruption.
- Extended test_init_valid_model to include kimi_k2 mapping to KimiK2Detector.
- Added test_kimik2_tool_interruption to TestReasoningParser for integration testing of Kimi K2 tool interruption.
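
The mapping change listed above follows a common registry pattern: a dictionary from model type to detector class. The sketch below shows that pattern in general terms only. `BaseDetector`, `Qwen3LikeDetector`, `KimiK2LikeDetector`, `DETECTOR_MAP`, and `make_detector` are hypothetical names, not sglang's actual classes or attributes.

```python
class BaseDetector:
    """Hypothetical common base for reasoning-format detectors."""


class Qwen3LikeDetector(BaseDetector):
    """Stand-in for a detector that only honors an explicit </think>."""


class KimiK2LikeDetector(BaseDetector):
    """Stand-in for a detector that also treats a tool section as end-of-thinking."""


# Before a change like this PR, "kimi_k2" would point at the Qwen3-style
# class; afterwards it points at the dedicated one.
DETECTOR_MAP = {
    "qwen3": Qwen3LikeDetector,
    "kimi_k2": KimiK2LikeDetector,
}


def make_detector(model_type: str) -> BaseDetector:
    """Look up and instantiate the detector registered for a model type."""
    try:
        detector_cls = DETECTOR_MAP[model_type.lower()]
    except KeyError:
        raise ValueError(f"Unsupported model type: {model_type}") from None
    return detector_cls()
```

Keeping the lookup in one table means swapping a model's detector is a one-line change, which is what makes a mapping test such as test_init_valid_model cheap to extend.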

Activity

Unit tests were added for KimiK2Detector initialization, non-streaming parsing with tool interruption, streaming parsing with tool interruption, and post-interruption normal text handling.
Integration tests were added through the ReasoningParser API for both streaming and non-streaming Kimi K2 tool interruption scenarios.
All existing and new unit tests for test_reasoning_parser.py passed successfully.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a dedicated KimiK2Detector to handle tool interruptions for Kimi K2 models, which is a great addition. The implementation is clean, and the related cleanup in server_args.py improves code clarity. The accompanying unit and integration tests are thorough and cover various scenarios, ensuring the new detector works as expected. I have one suggestion to refactor some duplicated test logic to improve maintainability.

- Implemented KimiK2Detector class to handle reasoning format with tool-call sections.
- Updated ReasoningParser to include KimiK2Detector.
- Added unit tests for KimiK2Detector covering initialization, tool interruption detection, and streaming parsing scenarios.

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

JustinTong0323 · 2026-03-02T14:31:20Z

/tag-and-rerun-ci

kpham-sgl

LGTM

Summarizes the intermittent tool call parsing failure that occurs when Kimi-K2.5 skips the </think> token and directly emits <|tool_calls_section_begin|>. Includes root cause analysis, a reference to the SGLang fix (PR sgl-project#19696), and questions for the Kimi team regarding expected token generation behavior. References: sgl-project#18086 https://claude.ai/code/session_01GnsZFpmtACr5U3Wkc7MZ2Z

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

gemini-code-assist Bot reviewed Mar 2, 2026


Comment thread test/registered/parser/test_reasoning_parser.py

JustinTong0323 added 3 commits March 2, 2026 14:22

fix: rename apply method to apply_weights in CompressedTensorsMxInt4M…

a7d88ef

…oE class for abstract method implementation Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

lint

1e252cb

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

JustinTong0323 force-pushed the fix/kimi-k25-implicit-tool-end-token-end-thinking branch from 2c1c342 to 1e252cb Compare March 2, 2026 14:29

JustinTong0323 requested review from AniZpZ, BBuf, Edwardf0t1, FlamingoPg, HaiShaw and ch-wan as code owners March 2, 2026 14:29

Merge branch 'main' into fix/kimi-k25-implicit-tool-end-token-end-thi…

32f470d

…nking

github-actions Bot added the run-ci label Mar 2, 2026

JustinTong0323 mentioned this pull request Mar 2, 2026

Revert "[Blackwell] Make mxint4 flashinfer_trtllm moe gemm set by default on blackwell" #19695

Closed

b8zhong mentioned this pull request Mar 3, 2026

Fix CompressedTensorsMxInt4MoE abstract method and relax GPQA baseline #19726

Merged

2 tasks

kpham-sgl approved these changes Mar 3, 2026


ispobock approved these changes Mar 3, 2026


ispobock merged commit dbf1247 into sgl-project:main Mar 3, 2026
181 of 204 checks passed

JustinTong0323 mentioned this pull request Mar 3, 2026

[Bug] Kimi-k2.5 tool call parser may not work probabilistically. #18086

Closed

5 tasks

AMD-yanfeiwang pushed a commit to AMD-yanfeiwang/sglang that referenced this pull request Mar 3, 2026

Add KimiK2Detector with tool interruption support (sgl-project#19696)

76b1a76

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

AMD-yanfeiwang pushed a commit to AMD-yanfeiwang/sglang that referenced this pull request Mar 3, 2026

Add KimiK2Detector with tool interruption support (sgl-project#19696)

760fd3a

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

Kangyan-Zhou pushed a commit to Kangyan-Zhou/sglang that referenced this pull request Mar 4, 2026

Add KimiK2Detector with tool interruption support (sgl-project#19696)

046d1af

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026

Add KimiK2Detector with tool interruption support (sgl-project#19696)

c520357

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026

Add KimiK2Detector with tool interruption support (sgl-project#19696)

d0db57e

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

JustinTong0323 added a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026

Add KimiK2Detector with tool interruption support (sgl-project#19696)

32c787b

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

Add KimiK2Detector with tool interruption support #19696
ispobock merged 4 commits into sgl-project:main from JustinTong0323:fix/kimi-k25-implicit-tool-end-token-end-thinking
